(54) Multibus cached memory system

(57) A method and apparatus for implementing a 
multibus cache memory system for use in computer sys- 
tems is disclosed utilizing a memory chip employing 
multiple distributed SRAM caches directly linked to a 
single DRAM main memory block. Each cache is directly 
linked to a different bus. Each chip further contains a 
partially distributed arbitration and control circuit for im- 
plementing cache policy and arbitrating memory refresh 
cycles. 

Description

BACKGROUND OF THE INVENTION 

1. Field of Invention.

The present invention relates to the field of computer
systems. More particularly, the present invention relates
to a method and apparatus for a multibused cached memory
system for use in computer systems.

2. Background of Invention.

The classic pattern of improvement in computer 
technology over the past decade has been increases in 
performance taking place concurrently with decreasing 
prices. At the heart of these performance improvements 
has been the ability to speed up the throughput of the 
computer. In the classic architecture, a computer con- 
sists of a central processing unit (CPU) connected 
through a local bus to a memory unit. The processor and
memory unit moved in a lock-step fashion with their actions
synchronized by a single clock. Fortunately, improvements
in the technology resulted in far faster processors and far
cheaper memory. A point of conflict arose, however, with
the local bus architecture limiting the fast processor to
the slower clock speeds required by the large, inexpensive
memory.

This bottleneck was first removed by IBM in 1984 
with the introduction of the PC-AT and later codified by 
the Industry Standard Architecture (ISA) in 1988. Under 
that architecture the CPU could run at a faster clock 
speed than memory. This performance improvement 
was achieved by providing two buses: one for the CPU 
known as a local bus and one for the rest of the system 
including memory known as a system bus. A drawback 
to this architecture was that the CPU managed all bus 
traffic. Therefore, during RAM updates from main mem- 
ory and data transfers between memory and peripheral 
devices, the CPU had to run at the slower system bus 
clock speed. In either of these cases, CPU performance 
was not optimal. 

The next step in performance improvement was 
embodied in the IBM Micro Channel Architecture (MCA) 
introduced in 1988 on the PS/2 line of computers. In this
architecture management of bus traffic as between the 
local and the system bus was handled by a special con- 
trol circuit rather than the CPU. This removed the CPU 
from full time bus management, thereby removing one 
of the constraints on the CPU's performance. 

Not, however, until the introduction of cache mem- 
ory was the CPU performance during memory accesses 
improved. This performance improvement was 
achieved by making a high speed, locally accessed 
copy of memory available to the CPU so that even dur- 
ing memory accesses the CPU would not always need 
to operate at the slower speeds of the system bus. This 
method of copying memory is referred to as caching a 
memory system and is a technique made possible by
virtue of the fact that much of the CPU access, as determined
by the computer source code itself, is in small, highly
repetitive address spaces which, once copied up from memory,
can be utilized through many bus cycles before needing to
be refreshed with the next address block. This method of
memory copying is advantageous on the read cycles of a
computer which, studies have shown, in contrast to the write
cycles, constitute 90% of the external accesses of a CPU.

The most popular hardware realization of a cached 
memory incorporates SRAM cache and DRAM main 
memory. A proprietary Enhanced DRAM (EDRAM) chip
developed by Ramtron International Corporation, 1850
Ramtron Drive, Colorado Springs, Colorado 80921, incorporates
both these memories on one chip. Access to the chip is
provided by a single bus. The product line is described in
that company's "Specialty Memory Products Data Book,"
October 1994, which is herein incorporated by reference.

The most popular expansion bus capable of proc- 
essor independent handling of multiple peripherals is 
the Peripheral Component Interconnect (PCI®) bus. 
The specifications for this bus are set forth in revision 
2.0 specification as provided by the PCI Special Interest 
Group, M/S HF3-15A, 5200 Northeast Elam Young 
Parkway, Hillsboro, Oregon 97124, which is herein in- 
corporated by reference. 

The problem which the prior art has not addressed, and
which needs to be addressed in order to further improve
system performance, is that a unified cache copy of main
memory, when shared between multiple CPUs or a single CPU
and several buses, results in cache copies which at any
point in time are not optimized for the traffic on any one
master unit, be it a CPU or an external I/O device. This
fact ends up degrading cache performance. What is needed is
a way to provide optimal caching performance in a multibus,
multiclock, multiprocessor environment.

SUMMARY OF THE INVENTION 

The method and apparatus of the current invention 
relates to a multibus cache memory system for use in 
computer systems. The system employs distributed 
cache, cache arbitration and control. All caches are 
tightly coupled to main memory. In a tightly coupled 
memory array, different devices all have access and 
place competing demands on a unified main memory. 

The current invention provides cache memory dis- 
tributed by bus, by memory block, and by row within 
each cache within each memory block. 

Specifically disclosed is a memory chip containing 
multiple SRAM caches directly linked to a single DRAM 
memory block. Each chip further contains a partially dis- 
tributed arbitration and control circuit for implementing 
cache policy and arbitrating memory refresh cycles. 
Each cache on the chip is directly linked to either a local 
or a PCI bus. Because each cache is dedicated to a specific
bus and/or device, the copy contained in cache is
more likely to be relevant to the next read cycle of the 
device which is being serviced by that cache than would 
be the case if there were only one cache serving multiple 
devices. Thus system performance is improved. The 
cache system employs a modified write-through config- 
uration. Cache coherency is dealt with at a chip level 
rather than a bus level. Snooping is, therefore, not re- 
quired; rather a simple comparator circuit is disclosed 
for maintaining memory block-level cache coherency. 
Each chip is connected to a system level control and 
arbitration unit which determines, in the event of con- 
current demands from separate buses, which bus shall 
have priority access to a memory block. 

Broadly, disclosed herein is a process for controlling 
and arbitrating cache policy. The process disclosed ac- 
counts for: handling competing demands of different 
buses with different clocks, determining whether an ac- 
cess request is for memory or I/O, implementing read/ 
write hit and miss policy, maintaining distributed cache 
coherency for write hits, and managing DRAM refresh. 

Further disclosed herein is a computer including a 
Pentium® CPU operating on a local bus and a periph- 
eral device operating on a PCI bus. Each bus is tightly 
coupled to the distributed cache memory system. The 
arbitration and control circuit and policy for this preferred 
embodiment is set forth. 

BRIEF DESCRIPTION OF THE DRAWINGS 

A better understanding of the present invention can 
be obtained when the following detailed description of 
the preferred embodiment is considered in conjunction
with the following drawings in which:

Fig. 1 is a block diagram of a computer incorporat- 
ing the present invention. 

Fig. 2 is a block diagram of a memory block unit 
incorporating distributed cache and cache arbitration 
and control circuits. 

Fig. 3A is a process flow diagram of the system level 
arbitration. 

Fig. 3B is a process flow diagram of the memory 
block-level arbitration for loading memory requests and 
making memory block-level main memory accesses. 

Fig. 3C is a process flow diagram of the memory 
block-level arbitration for processing read hits. 

Fig. 3D is a process flow diagram of the memory 
block-level arbitration for processing read misses and 
writes. 

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

A multibus, multiclock, multidevice, distributed 
cache memory system for use in computers is hereby 
disclosed. The system utilizes distributed cache in con- 
junction with distributed cache arbitration and control. 



In order to bridge the gap between fast processors,
DMA devices, or PCI masters, and slow, tightly coupled
memory, a cache memory is critical. Cache memory is
a small amount of very fast memory that is used to store
a copy of frequently accessed code and data from main
memory. The microprocessor(s), the PCI bus master(s),
and the DMA device(s) can operate out of this very fast
cache memory and thereby reduce the number of wait
states that must be interposed during memory access.

The current invention calls for all buses, servicing
CPU, DMA, or PCI devices, to share memory in a tightly
coupled arrangement. The advantage of a tightly coupled
arrangement is that each device has equal access
to main memory.

Cache thrashing would normally negate many of the
benefits of a tightly coupled arrangement. By thrashing
is meant the repeated insertion and removal of data from
a cache. This is particularly likely to occur where more
than one device, each with a different address requirement,
is sharing the same cache. Therefore, it is advantageous
that cache be distributed with respect to the
main memory.
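
By way of illustration only, the thrashing effect described above can be sketched in Python. The sketch assumes single-row caches and a hypothetical access trace in which a CPU and a PCI device alternate between two different rows. The function name `hit_count` and the trace are illustrative assumptions and do not appear in the disclosure.

```python
def hit_count(accesses, shared):
    """Count read hits for a trace of (device, row) accesses.

    Assumption: every cache holds exactly one row. With shared=True all
    devices use one cache; otherwise each device has its own cache.
    """
    cached = {}          # cache name -> row currently held
    hits = 0
    for device, row in accesses:
        key = "shared" if shared else device
        if cached.get(key) == row:
            hits += 1            # row already present: read hit
        else:
            cached[key] = row    # miss: the old row is evicted
    return hits


# Hypothetical trace: the CPU works in row 1 while the PCI device works in row 7.
trace = [("cpu", 1), ("pci", 7)] * 4

print(hit_count(trace, shared=True))   # 0: each access evicts the other device's row
print(hit_count(trace, shared=False))  # 6: only the first access per device misses
```

Under these assumptions the shared cache never hits, whereas the per-device caches miss only on first use, which is the motivation for distributing cache by bus.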

Distributed Cache



According to the current invention, it is advantageous
to distribute cache along what may be considered
three axes. The first axis will be called the bus axis, the
second axis will be called the memory block axis, and
the third axis will be called the cache row axis. It is
advantageous that there be at least one cache devoted to
each bus. A bus may serve more than one device and
each bus may run at a different clock speed. The dimension
of the first axis is, therefore, the number of buses.

It is further advantageous that main memory, both
for hardware and throughput reasons, be divided into
memory address blocks and that each of these memory
address blocks subdivide the cache devoted to each
bus so that each memory block has at least one cache
per bus. The dimension of the second axis is, therefore,
the number of discrete memory blocks. If, for example,
there are two buses and main memory is divided into
two blocks, then each block would contain two caches,
each devoted to a different bus, for a total of four caches.

It is further advantageous that within each cache in
a memory block there be multiple address rows. This is
advantageous because the cache herein described may
well be performing in an L2 relationship to any one of
the devices on the bus which it is servicing. In this case,
it is advantageous that the cache so characterized as
L2 be significantly larger than the onboard cache of the
device that it may be servicing so as to enhance, by
increasing the read hit rate, the performance of the device
it is servicing. The dimension of the third axis is,
therefore, the number of address rows in each cache
associated with a specific memory block and with a specific
bus connected to that memory block.

Distributed Control and Arbitration

Of equal importance to the distribution of cache is 
the distribution of cache control and/or arbitration. By 
cache control is meant the characterization of a cache 
request as either a read or a write and the subsequent 
implementation of appropriate policy in response to a hit 
or a miss. By cache arbitration is meant the resolution 
of the competing demands on cache as between multi- 
ple devices operating at multiple clock speeds in a mem- 
ory environment which may, independent of the access 
demands, also require intermittent memory refreshing 
as is the case with DRAM. Distribution of cache control 
and arbitration to the memory block-level, as opposed 
to the system level, allows for higher memory through- 
put. This latter feature is the result of the parallelism re- 
sulting from multiple control and arbitration decisions 
made simultaneously at a local level as opposed to se- 
rially at a global system controller level. 

The cache policies implemented herein are, with re- 
spect to reads, best characterized as a distributed, par- 
tially set-associative cache. By this is meant that each 
main memory location is mapped to the cache of a spe- 
cific memory block. 

The write policy is best described as a distributed 
write-through. By this is meant that a write request from
any device is immediately satisfied by direct access to
the appropriate main memory block. Coherency is maintained
through a set of simple comparator operations
between those caches associated with a specific memory
block. Cache coherency is the term given to the problem 
of assuring that the contents of cache memory and 
those of main memory for all caches in a memory block 
are either identical or under tight enough control that 
stale and current data are not confused with each other. 
The term "stale data" is used to describe data locations 
which no longer reflect the current value of the memory 
location they once represented. Therefore a write to 
cache will only update those caches which contain the 
same address as that being updated in main memory. 
A simple memory block-level comparator circuit can be
used to maintain cache coherency, because cache policy
calls for write-through to main memory and because
architecturally cache is directly connected to each main 
memory block, rather than being separated from main 
memory by a bus. As a result, no bus level snooping is 
required to maintain cache coherency. 
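
The write-through and comparator behavior described above can be sketched as follows. This is a minimal software model under stated assumptions, not the disclosed circuit: each cache holds a single row, a row is four addresses wide, and unwritten memory reads as 0. The class name `MemoryBlock` and its methods are hypothetical.

```python
class MemoryBlock:
    """One memory block: main memory plus one single-row cache per bus."""

    def __init__(self, buses, row_size=4):
        self.row_size = row_size      # assumed row width, in addresses
        self.main = {}                # address -> value (main memory)
        self.cache = {bus: {"row": None, "data": {}} for bus in buses}

    def _row(self, addr):
        return addr // self.row_size

    def write(self, addr, value):
        """Write-through: update main memory, then any cache holding that row."""
        self.main[addr] = value
        row = self._row(addr)
        for entry in self.cache.values():
            if entry["row"] == row:            # comparator match
                entry["data"][addr] = value    # replace the stale copy

    def read(self, bus, addr):
        """Return (value, hit). A miss loads the whole row into that bus's cache."""
        entry = self.cache[bus]
        row = self._row(addr)
        hit = entry["row"] == row
        if not hit:
            base = row * self.row_size
            entry["row"] = row
            entry["data"] = {a: self.main.get(a, 0)
                             for a in range(base, base + self.row_size)}
        return entry["data"][addr], hit


blk = MemoryBlock(["local", "pci"])
blk.write(5, 11)
print(blk.read("local", 5))   # (11, False): first read misses and loads the row
blk.write(5, 99)              # write-through also refreshes the local cache
print(blk.read("local", 5))   # (99, True): hit, and the data is not stale
```

Because every write reaches main memory and the caches sit beside it, a simple row comparison per cache is enough in this model; no cache needs to observe traffic on another bus.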

Apparatus 

The apparatus of the present invention is most generally
shown in one preferred embodiment in Fig. 1. Fig. 1
is a system level block diagram of the popular Intel
Pentium® CPU 20, operating on local bus 22 with access to
a distributed memory 24 comprising SRAM cache and
DRAM main memory. The local bus comprises data line
26, address line 28 and control line 30. A PCI controller
32 and device 34 are also shown linked to distributed
cache and main memory 24 by a PCI bus 36. The PCI
bus operates at a slower clock speed than the local bus
22. The PCI bus comprises a multiplexed address and
data line 38 and a control line 40.

The three axis cache policy discussed above is represented
diagrammatically in Fig. 1 in the following manner.
Memory is broken down into two blocks: Block A 42
and Block B 44. Block A comprises an arbitration and
control unit 46, a main memory 48, a cache 50 devoted
to the local bus 22 and a cache 52 devoted to the PCI
bus 36. Block B is similarly configured, comprising an
arbitration and control unit 54, a main memory 56, a
cache 58 devoted to the local bus and a cache 60 devoted
to the PCI bus. Within each cache there are two
address rows. Cache 50 includes rows 62, 64; cache 52
includes rows 66, 68; cache 58 includes rows 70, 72;
and cache 60 includes rows 74, 76. Therefore, the unit
value for the bus axis is 2, the unit value for the memory
block is 2, and the unit value for the cache row is 2. The
number of distributed caches is the triple product of the
unit numbers of each axis or in this case 2 x 2 x 2 = 8.
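
The Fig. 1 arrangement can be enumerated with a short Python sketch. The axis values are taken from the example above; the identifiers `BUSES`, `BLOCKS`, `ROWS_PER_CACHE` and `cache_rows` are illustrative assumptions.

```python
from itertools import product

BUSES = ("local", "pci")   # bus axis
BLOCKS = ("A", "B")        # memory block axis
ROWS_PER_CACHE = 2         # cache row axis


def cache_rows(buses, blocks, rows_per_cache):
    """Enumerate every (block, bus, row) slot of the distributed cache."""
    return list(product(blocks, buses, range(rows_per_cache)))


slots = cache_rows(BUSES, BLOCKS, ROWS_PER_CACHE)
caches = {(block, bus) for block, bus, _ in slots}

print(len(caches))  # 4 per-bus caches (50, 52, 58, 60 in Fig. 1)
print(len(slots))   # 8 row slots, the triple product 2 x 2 x 2
```

Changing any one axis value scales the slot count proportionally, which is how the three dimensions trade off against memory block real estate.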

As further shown in Fig. 1, a partially distributed 
cache arbitration and control policy is implemented in 
this preferred embodiment. In general, the more distrib- 
uted the cache arbitration and control, the faster the sys- 
tem throughput. In this particular preferred embodiment, 
a partially distributed arrangement has been shown in 
order to conserve memory block real estate. Partial ar- 
bitration and control occurs at a system level in system 
control and arbitration unit 78 which is used to arbitrate 
priority between concurrent requests; to determine 
whether a request for an address mapped to either 
memory or input/output is in fact for memory; and to con- 
trol which memory block is being addressed. Addition- 
ally, unit 78 can promote pipelining behavior in the mem- 
ory blocks. For example, if there are consecutive read 
requests on the CPU local bus that are directed to dif- 
ferent memory blocks then while the first request is be- 
ing output by one memory block to the local bus the sec- 
ond memory block can begin the processing of the next 
read request. These processes are shown in process 
flow diagram in Fig. 3A. The system control and arbitration
unit 78 is connected by address and command line
80 and control line 82 to memory block 42, and by address
and command line 84 and control line 86 to memory
block 44. This replication of address and control
lines between the system control and arbitration unit 78
and the memory blocks allows for increased throughput
by providing for concurrent memory access to different
blocks by each bus.
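
A minimal sketch of the system level decisions follows. The disclosure does not state an address map or a priority rule at this point, so the constants `MEMORY_LIMIT`, `BLOCK_SIZE` and `PRIORITY` and the function `arbitrate` are hypothetical placeholders chosen only to make the example concrete.

```python
MEMORY_LIMIT = 0x2000         # assumption: addresses below this map to memory
BLOCK_SIZE = 0x1000           # assumption: block A first, then block B
PRIORITY = ("local", "pci")   # assumption: fixed priority, local bus first


def arbitrate(requests):
    """Grant at most one request per memory block for one arbitration cycle.

    requests maps a bus name to the address it is requesting.
    Returns a list of (bus, block, address) grants.
    """
    grants = []
    busy = set()
    for bus in PRIORITY:
        if bus not in requests:
            continue
        addr = requests[bus]
        if addr >= MEMORY_LIMIT:
            continue                  # the request is for I/O, not memory
        block = "A" if addr < BLOCK_SIZE else "B"
        if block in busy:
            continue                  # lower priority bus waits for this block
        busy.add(block)
        grants.append((bus, block, addr))
    return grants


# Requests to different blocks proceed concurrently.
print(arbitrate({"local": 0x0010, "pci": 0x1010}))
# Requests to the same block are serialized by priority.
print(arbitrate({"local": 0x0010, "pci": 0x0020}))
```

The first call grants both buses because each block has its own address and control lines; the second grants only the higher priority bus.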

The other portion of the arbitration and control en- 
vironment is in control and arbitration units 46 and 54 
associated respectively with Block A 42 and Block B 44. 
Generally these control cache read/write policy, and ar- 
bitrate between memory access and memory refresh as 
required by the volatile DRAM of main memory units 48, 
56. 

The hardware of one of these units, unit 42, is 
shown in greater detail in Fig. 2. Unit 42 consists generally
of an arbitration and control section 46. Within arbitration
and control section 46 of memory block A 42
there are five basic hardware modules. 

Module 100 contains hardware for the uploading of
address and command from the system arbitration and
control unit 78 to the local bus 104, 106 and PCI bus
108, 110 address and command buffers. Module 100 also
contains hardware for downloading of these buffers
to main memory 48 when such access is required.

Module 150 contains the hardware for processing
local bus read hits and misses, and module 200 contains
the circuitry for handling PCI bus read hits and misses.
Hardware module 250 contains the circuitry for handling
a write hit or miss as requested by either the PCI or local
bus. Hardware module 300 contains the comparator circuitry
necessary to classify reads or writes as a hit or a
miss. Hardware module 350 contains the main block
memory, the distributed cache directories and data 52,
50 for the PCI bus 36 and the local bus 22, and the refresh
circuitry connected with maintaining a volatile
DRAM main memory. The last hardware module, module 400,
contains the input/output control and data latching
required for memory block 42.

Referring specifically to hardware module 100, this
module consists of demultiplexer 102, local bus address
104 and command 106 buffers, PCI bus address 108
and command 110 buffers, multiplexer 118 and multiplexer
drivers 112 and 114. Demultiplexer 102 is driven
by control signals received from the system arbitration
and control unit 78 over signal lines 82. Address and
command is provided over signal lines 80. In response
to a control signal on lines 82, an address and command
available on lines 80 is directed by demultiplexer 102 to
the appropriate buffers, either local 104, 106 or PCI 108,
110.

If narrow JEDEC® address architecture is used,
then address and command line 80 will convey multiplexed
address information in the form of serially transmitted
upper (row) and lower (column) address bytes.
Upper bytes will be loaded by demultiplexer 102 into the
upper bytes of the appropriate buffers, either 104 or 108,
and lower bytes will be loaded by demultiplexer 102 into
the lower bytes of the appropriate buffers, either 104 or
108. If only lower address bytes are transmitted, then
loading of buffers 104 or 108 will involve writing only to
the lower bytes of the buffer, leaving the upper bytes
unchanged. This latter mode of addressing reduces the
transmission time for JEDEC® standard address information
involving consecutive address read/write requests
directed to different columns (lower bytes) within
the same row (upper byte) of memory. A burst write in
this context is a series of consecutive writes to the columns
within the same row of memory.
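
The row and column loading just described can be sketched as a small buffer model. The sketch assumes one byte for the row and one byte for the column, which the text does not specify; the class `AddressBuffer` and its `load` method are hypothetical.

```python
class AddressBuffer:
    """Address buffer with a separately loadable row (upper) and column (lower) byte."""

    def __init__(self):
        self.row = 0   # upper byte
        self.col = 0   # lower byte

    def load(self, col, row=None):
        """Load the column byte; load the row byte only when one is transmitted."""
        if row is not None:
            self.row = row & 0xFF
        self.col = col & 0xFF      # the row byte is left unchanged if row is None

    @property
    def address(self):
        return (self.row << 8) | self.col


buf = AddressBuffer()
buf.load(0x10, row=0x3A)    # full address: row and column both transmitted
print(hex(buf.address))     # 0x3a10
buf.load(0x11)              # same row: only the column is transmitted
print(hex(buf.address))     # 0x3a11
```

Consecutive accesses within one row therefore need only the column transfer, which is the saving in transmission time noted above.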

Multiplexer 118 is involved in the downloading of the
contents of local bus buffers 104, 106 or PCI bus buffers
108, 110 to main memory. The determination of which
of these buffers is to be downloaded to main memory,
when these buffers are to be downloaded to main memory,
and if these buffers are to be downloaded to main
memory is made by skip and inhibit multiplexer control
circuits 112 and 114. Skip circuit 112 is responsible for
determining which is the next buffer to be processed and
for determining whether that buffer requires a memory
download. If, for example, skip circuit 112 determines
that the buffer eligible for downloading can be handled
instead through cache as a read hit, then the downloading
of that buffer will be skipped. Once it has been determined
by skip circuitry 112 which is the next buffer to
be downloaded, then inhibit circuitry 114 determines
when the appropriate time for that downloading is. If, for
example, main memory is currently under a refresh or
processing a prior download, then inhibit circuit 114 will
prevent multiplexer signal 116 from enabling the multiplexer
in choosing which buffer to download until such
time as the inhibiting factors have been removed.
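
The division of labor between the skip and inhibit circuits can be sketched as a single decision function. The buffer representation (a dictionary with `bus`, `command` and `hit` keys) and the function name `next_download` are assumptions made for illustration.

```python
def next_download(buffers, refresh_active, download_active):
    """Return the bus whose buffers should be downloaded now, or None.

    buffers is an ordered list of dicts with keys:
      "bus"     -- "local" or "pci"
      "command" -- "read" or "write"
      "hit"     -- True if the comparator reports a cache hit
    """
    # Skip step: a read hit is served from cache and needs no memory download.
    candidate = None
    for buf in buffers:
        if buf["command"] == "read" and buf["hit"]:
            continue
        candidate = buf["bus"]
        break

    # Inhibit step: hold off while main memory is refreshing or still busy.
    if candidate is None or refresh_active or download_active:
        return None
    return candidate


pending = [
    {"bus": "local", "command": "read", "hit": True},    # skipped: read hit
    {"bus": "pci", "command": "write", "hit": False},    # needs main memory
]
print(next_download(pending, refresh_active=False, download_active=False))  # pci
print(next_download(pending, refresh_active=True, download_active=False))   # None
```

Writes are never skipped in this sketch because the write-through policy sends every write to main memory.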

Signals emanating from hardware module 100 include
signal 120 which contains the command available
in command buffer 106. This command, either a read or
write, would have been initially obtained by the system 
arbitration and control unit off the W/R line on the local 
bus. Signal 122 contains the command, read/write, in 
the PCI bus command buffer 110. This signal was orig- 
inally derived from the PCI bus C/BE# multiplexed con- 
trol line as described in the ensuing discussion of PCI 
bus protocols. Signal line 124 contains the address 
downloaded from system arbitration and control unit 78 
and initially derived from the local bus A[31:3] address 
line. Signal line 126 contains the address downloaded 
from the system arbitration and control unit from the PCI 
bus and initially available on the multiplexed AD lines of 
that bus. Signal lines 128 and 130 contain respectively 
the address and command, read/write, downloaded to 
main memory by multiplexer 118 from either the local 
bus buffers 104, 106 or PCI bus buffers 108, 110. 

The next hardware module 150 contains the circuitry
connected with processing read hits and misses originating
from the local bus. Transmission gate 152 imposes
a precondition on the processing of a local bus
buffer 106 read. NAND gate driver 154, accepting signal
inputs 306 and 128, requires, in order to enable the
transmission gate 152, that there not be a write command
being processed at the memory level which requires
updating of stale data found in local bus cache 50. This
condition would be present, for example, when local bus
cache comparator line 306 was active and when memory
level command line 130 was active, indicating that a
write was in process and local bus cache 50 contained
stale data.

When the disabling conditions are not found, transmission
gate 152 would be enabled. The enabled output
of transmission gate 152 is one input of AND gate 156.
The other input is from local bus comparator 302 via signal
line 306. When AND gate 156 senses a read command
from the local bus buffer and a hit on line 306, it
is active. Under these conditions, a read hit would be in
order and all data is read from local bus cache 50. The
enabled output of AND gate 156 would be presented to
an input of OR gate 158 thereby driving burst interval
countdown circuit 160, an output of which is burst enable
signal 162. A burst read is defined to be the serial transfer
of data packets (columns) from a row of cache.

EP 0 762 287 A1

If, alternatively, a read hit is not detected on AND
gate 156, then a read miss would be detected in circuits
166. Circuit 166 detects the presence of a local bus buffer
read miss. If, for example, a read is present on enabled
transmission gate 152 output and the AND gate
156 has not detected a read hit, then a read miss is being
processed. Skip circuitry 112 will identify local bus buffers
104, 106 as requiring a memory download. When
the contents of local bus buffers 104, 106 have been
copied to main memory, inhibit circuit 164 will sense that
condition. This condition in conjunction with a read miss
on lines 120 and 306 activates delay off circuit 166.
When delay off circuit 166 is enabled, signals 172 and
170 are active. Signal 172 will enable transmission gate
360. Signal 170 will activate main memory 48. Under
these conditions the requested address and data will be
uploaded from main memory 48 to cache 50. At an appropriate
interval after activation of circuit 166, delay on
circuit 168 will enable its output thereby transmitting an
active signal to an input of OR gate 158 which will drive
burst interval countdown circuitry thereby emitting a
burst enable signal 162. After initial loading of cache 50
from main memory 48, all data is read in burst mode
from cache 50.
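
The local bus read path described above reduces to a small amount of combinational logic, sketched here in Python. The function `local_read_path` and its return labels are hypothetical; timing elements such as the delay circuits and the burst interval countdown are deliberately omitted.

```python
def local_read_path(command, cache_hit, memory_write_active):
    """Classify one local bus buffer entry.

    command             -- "read" or "write" held in the command buffer
    cache_hit           -- comparator output for the local bus cache
    memory_write_active -- a write is being processed at the memory level
    """
    # Precondition gate: disabled while a memory-level write is updating
    # a row that this cache also holds, i.e. the cached copy is stale.
    gate_enabled = not (cache_hit and memory_write_active)

    if command != "read":
        return "not_a_read"
    if not gate_enabled:
        return "blocked"
    if cache_hit:
        return "burst_from_cache"        # read hit
    return "load_row_then_burst"         # read miss: fill cache, then burst


print(local_read_path("read", cache_hit=True, memory_write_active=False))
print(local_read_path("read", cache_hit=False, memory_write_active=False))
print(local_read_path("read", cache_hit=True, memory_write_active=True))
```

In both the hit and the miss case the data ultimately leaves through the cache in burst mode; a miss only adds the row load from main memory first.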

The next hardware module 200 is almost identical 
to that of hardware module 150 except for those differ- 
ences required by the difference in bus protocols on the 
PCI as opposed to the local bus. Transmission gate 202 
specifically imposes the precondition on PCI buffer read 
analysis that there not be a current memory write access 
which requires updating of the contents of PCI cache 
52. If this limiting condition is indicated, then lines 130
and 308, serving respectively as the memory level command
status and PCI cache comparator inputs to NAND
gate 204, will be active. Under these conditions NAND
gate 204 will be inactive, thereby disabling transmission gate
202.

When the limiting condition is removed from the in- 
puts of NAND gate 204 then transmission gate 202 will 
be enabled. Under these circumstances a PCI bus read 
hit will be detected by AND gate 206 when its input from 
PCI command buffer 110 indicates a read request and 
its signal input 308 from PCI cache comparator 304 in- 
dicates a hit. If this condition is detected, then AND gate 
206 sends an active signal to an input of OR gate 208 
which in turn sends an active signal to an input of AND 
gate 210. As long as an active output from OR gate 208
is coupled with an active FRAME# signal from the PCI
control bus 40, then the PCI burst mode signal emanat- 
ing from AND gate 210 will continue to be active. The 
PCI bus, unlike the local bus, supports burst modes of 
variable length. 
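
A simplified sketch of a variable length PCI burst gated by FRAME# follows. It treats FRAME# as a plain boolean sampled once per clock and stops at the end of the cached row; real PCI signalling (active-low levels, the final data phase after FRAME# is deasserted, wait states) is not modelled. The function name `pci_burst` is hypothetical.

```python
def pci_burst(row, start, frame_asserted):
    """Return the data packets transferred in one burst.

    row            -- list of data packets (columns) held in the cache row
    start          -- index of the first column to transfer
    frame_asserted -- iterable of booleans, one per clock, True while the
                      master keeps the transaction open
    """
    transferred = []
    col = start
    for frame in frame_asserted:
        if not frame or col >= len(row):
            break                      # burst ends: master released, or row exhausted
        transferred.append(row[col])
        col += 1
    return transferred


row = [10, 11, 12, 13]
print(pci_burst(row, 1, [True, True, False, True]))   # [11, 12]
print(pci_burst(row, 2, [True] * 5))                  # [12, 13]
```

The burst length is thus set by the master at run time rather than by a fixed countdown, which is the difference from the local bus noted above.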

If, alternatively, a read hit is not detected on AND
gate 206, then a read miss would be detected in circuits
218. Circuit 218 detects the presence of a PCI bus buffer
read miss. If, for example, a read is present on enabled
transmission gate 202 output and the AND gate 206 has
not detected a read hit, then a read miss is being processed.
Skip circuitry 112 will identify PCI bus buffers
108, 110 as requiring a memory download. When contents
of the PCI bus buffers have been copied to main
memory, inhibit circuit 216 will sense that condition. That
condition in conjunction with a read miss on lines 122
and 308 activates delay off circuit 220. When delay off
circuit 218 is enabled, signals 224 and 222 are active.
Signal 224 will enable transmission gate 362. Signal 222
will activate main memory 48. Under these conditions
the requested address and data will be uploaded from
main memory 48 to cache 52. At an appropriate interval
after activation of circuit 218, delay on circuit 220 will
enable its output thereby transmitting an active signal to
an input of OR gate 208 which will enable an input of
AND gate 210. As long as the FRAME# input to AND
gate 210 is also active, a burst enable signal 214 will be
present at the output of AND gate 210.

The next hardware module 250 is connected with
processing memory access writes. When a write signal
is detected on line 130 by delay off circuitry 252, an active
signal 258 is sent to the inputs of OR gate 352 thereby
enabling access to main memory 48. If concurrently
a write hit is detected on the local bus cache comparator
302 by delay on circuit 254, then an active signal 172
will be sent to transmission gate 360 enabling the copying
of new data from main memory 48 to replace the
stale data in cache 50. If concurrently a write hit is detected
by delay on circuit 256 with respect to PCI cache
52, then an active signal 224 will be passed to transmission
gate 362 thereby enabling the copying of data
written to main memory to cache 52 to refresh stale data
in the PCI cache. Thus, hardware module 250 handles
the processing of write hits and misses.

The next hardware module 300 is generally concerned
with handling local bus cache comparisons and
PCI bus cache comparisons by means of comparators
302 and 304, respectively. Local bus cache comparator
302 has two inputs: input 372 from an address portion
of cache 50 and input 318 from either transmission gate
310 or transmission gate 314. PCI cache comparator
304 has two inputs: specifically, signal 382 from an address
portion of cache 52 and signal 320 from either
transmission gate 312 or transmission gate 316.

Transmission gates 310, 312 operate in unison;
transmission gates 314, 316 operate in unison and in
opposition to transmission gates 310, 312. When one
pair is enabled, such as 310, 312, the other is disabled.
Transmission gate 310 inputs are coupled to local bus
address buffers 104 signal lines 124, and transmission
gate 312 inputs are coupled to the PCI address buffers
108 signal lines 126. Transmission gates 314 and 316
inputs are coupled to address lines 128.

Transmission gates 310 and 312 are driven active,
in other words enabled, when control line 130 indicates
a read. Alternately, when control line 130 indicates a
write, transmission gates 310 and 312 are disabled and
transmission gates 314 and 316 are enabled. This transmission
gate circuitry is required in order to maintain
cache coherency. During a write access to main memory
48, caches 50, 52 in memory block 42 are unified, and
during a read access caches 50, 52 are distributed. This
ensures cache coherency amongst all caches in a given
memory block during a write while still allowing for each
cache to perform in a distributed manner when a read
is being processed.
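
The input selection performed by the two pairs of transmission gates can be sketched as follows. The function `comparator_hits` and its parameter names are hypothetical; each cache is assumed to expose a single row tag for comparison.

```python
def comparator_hits(memory_command, local_addr, pci_addr, memory_addr,
                    local_tag, pci_tag):
    """Return (local_hit, pci_hit) for one memory block.

    On a read, each comparator sees the row address from its own bus buffer.
    On a write, both comparators see the row address presented to main memory.
    """
    if memory_command == "write":
        local_in = pci_in = memory_addr          # caches behave as unified
    else:
        local_in, pci_in = local_addr, pci_addr  # caches behave as distributed
    return local_in == local_tag, pci_in == pci_tag


# The local cache holds row 3 and the PCI cache holds row 9.
print(comparator_hits("read", 3, 7, 9, local_tag=3, pci_tag=9))    # (True, False)
print(comparator_hits("write", 3, 7, 9, local_tag=3, pci_tag=9))   # (False, True)
```

In the write case the PCI cache is flagged for update even though the write did not originate on the PCI bus, which is what keeps every cache in the block coherent.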

The next hardware module 350 contains main
memory 48, local bus and PCI bus caches 50 and 52,
respectively, and refresh circuitry 354. Local bus cache
50 data output 370 is driven by burst column and latch
circuitry 368. Circuit 368 is programmed, in the case of
a Pentium® local bus, for burst generation in the Intel®
interleaved burst mode format. When a burst enable signal
162 is detected at the inputs of AND gate 364, and
is coupled with CPU clock signal 366, the output of AND
gate 364 enables the circuitry in burst column and latch
circuit 368. The output of burst column and latch circuit
368 causes specific data packets contained in cache 50
to be output.

Cache 52, the PCI cache for this particular memory
block, provides linear burst mode data on signal lines
380. The operation of cache 52 is determined by burst
column and latch circuitry 384. This circuitry is
preprogrammed to a linear burst mode appropriate for the
PCI bus protocol. The burst column and latch circuitry
384 is activated when AND gate 374 detects the presence
of PCI clock signal 366, a burst enable PCI signal 214,
and the absence of a wait state introduced by the PCI
bus master in the form, for example, of an IRDY# signal.
When these conditions are detected, burst column and
latch circuitry 384 is enabled by the output of AND gate
374. The output of burst column and latch circuit 384
causes specific data packets contained in cache 52 to
be output in a PCI linear burst mode format on data
lines 380.
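
The difference between the two burst formats may be
illustrated by the following Python sketch. It assumes
the commonly documented Intel interleaved ordering, in
which the column index of each transfer is the starting
index exclusive-ORed with the transfer count, and a
simple incrementing order for the linear mode. The
function names and the use of small integer column
indices are hypothetical.

```python
def interleaved_burst(start, length=4):
    """Column order for an interleaved burst of fixed length.

    Assumes the XOR ordering usually given for Intel interleaved
    bursts; length should be a power of two.
    """
    return [start ^ i for i in range(length)]


def linear_burst(start, count):
    """Column order for a linear burst: simple incrementing columns."""
    return [start + i for i in range(count)]
```

For a starting column of 1, the interleaved order visits
columns 1, 0, 3, 2, whereas the linear order visits 1, 2,
3, 4.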

If wide DRAM memory architectures are being utilized
in conjunction with relatively narrow buses, it is
desirable that a read hit be processed commencing at
varied locations within a given row of data in caches
50, 52. Data bursts could then originate at different
locations within a row. In order to facilitate this
capability, burst column and latch circuits 368 and 384
would contain a start column address buffer and latch
which is independently programmable over input lines 80
and 82. Under these circumstances the address portion of
caches 50, 52 would contain the most significant bits of
an address corresponding to a row in DRAM main memory 48.

An access to main memory 48 is enabled when any
one of the inputs of OR gate 352 is active. Thus access
to main memory is enabled when signal line 170 is active,
thereby indicating the processing of a local bus read
miss, when signal line 222 is active, indicating the
processing of a PCI bus read miss, or when signal line
258 is active, indicating the processing of a memory
access write. When a memory access read miss is being
processed, as indicated by a read signal on main memory
access command input line 130, then depending on
the bus requesting the read, either transmission gates
360 or 362 will be enabled, thereby providing a path for
copying address and data from main memory to the
appropriate cache.

Alternately, when a write is being processed by 
main memory as indicated by a write on line 130, then 
in the event of a write miss, neither transmission gates 
360 nor 362 will be enabled. In the event of a write hit, 
either one or both of transmission gates 360 or 362 will 
be enabled depending on which caches contain stale 
data that needs to be updated in order to maintain co- 
herency between caches 50, 52 and main memory 48. 
Both caches need to be updated only if both contain data 
for the same memory address as that being written to. 
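
The enabling conditions described in the two preceding
paragraphs may be sketched as follows. The assignment of
gate 360 to the local bus cache and gate 362 to the PCI
cache is an assumption made for this sketch, as are the
function and parameter names.

```python
def main_memory_enabled(local_read_miss, pci_read_miss, write):
    """OR gate 352: inputs modelled on lines 170, 222 and 258."""
    return local_read_miss or pci_read_miss or write


def cache_gate_enables(is_write, requester, local_hit, pci_hit):
    """Return (gate_360, gate_362) enables for one main memory access.

    Assumption: gate 360 feeds the local bus cache and gate 362 the
    PCI cache. requester is "local" or "pci" and is only consulted
    for a read miss.
    """
    if is_write:
        # Write miss: neither gate. Write hit: every cache holding
        # the written address is updated.
        return local_hit, pci_hit
    # Read miss: only the cache serving the requesting bus is filled.
    return requester == "local", requester == "pci"
```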

The remaining portion of the circuitry connected
with hardware module 350 is the refresh circuitry.
Multiplexer 356 provides access to main memory either
for the refresh address provided by refresh circuitry
354 or for the memory access address available on lines
128. Multiplexer control signal 358 is, in traditional
EDRAM parlance, the /F signal. When signal line 358 is
active, refresh address generation circuitry 354 is
enabled and row by row refreshing of main memory 48 is
accomplished.

Alternately, when multiplexer control line 358 is not 
active, in other words, refresh is not in progress, then 
the address made available to main memory is that ad- 
dress present on memory access signal lines 128. 
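
The address selection performed by multiplexer 356 may be
sketched as follows; the function and parameter names are
illustrative only.

```python
def mux_356(refresh_active, refresh_row, access_addr):
    """Select the address presented to main memory.

    refresh_active models control signal 358; refresh_row is the row
    supplied by refresh circuitry 354; access_addr models lines 128.
    """
    return refresh_row if refresh_active else access_addr
```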

The remaining hardware module 400 handles the 
input/output control and data latching connected with 
this memory block. Circuit 402 is the input/output control 
and data latching connected with the local bus, and cir- 
cuit 404 is the input/output control and data latching con- 
nected with the PCI bus. Control lines 30 provide the 
appropriate signals for local bus data management to 
circuit 402. For example, during a read cycle, signal
BRDY# would be available on a control line 30 to
indicate the presence of data available to be read on
bus 26. When no local bus memory access is directed to
memory block 42, circuit 402 would tristate its outputs
to local bus 26.

The circuitry of unit 404 is a little more complicated
due to the wait states that can be introduced on the PCI
bus and the multiplexed protocol for handling both
address and data. Therefore, circuitry in 404 must,
through control lines 40, handle transmission and receipt
handshakes with signals TRDY# and IRDY#, and demultiplex
the AD lines with signals DEVSEL# and FRAME#.

The circuitry of 402, 404 and in system arbitration
and control unit 78 that is connected with bus protocol
and data management is set forth in the references. The
basic hardware, however, for handling distributed cache
and distributed arbitration and control has been set
forth in Fig. 1-2 and, although many hardware
realizations of this distributive cache policy can be
implemented, the central characteristics of distributed
cache control from a hardware perspective have been set
forth. What remains is to describe from a process point
of view the method of the current invention.

Method 

Overall operation of the system is shown in Fig. 3A-D.
These process flow diagrams reflect the basic
considerations applicable to interfacing multiple buses
with a tightly coupled and distributively cached memory.
Each bus is characterized as communicating with mem- 
ory in terms of: clock, announcement, command, ad- 
dress, wait, data, and completion. Both buses in this pre- 
ferred embodiment operate in burst mode. Maximum 
burst duration for the PCI bus is assumed to have been 
initialized at the PCI bridge and burst mode interleaved/ 
linear is assumed to have been set at the burst mode 
generator in each memory block. 

The processing within system level arbitration and 
control unit 78 is shown in Fig. 3A. Commencing with 
start block 500, two parallel paths process announce- 
ments, commands, and addresses occurring on the lo- 
cal bus and the PCI bus. 

On the CPU local bus, the next clock 502 determination
leads to decision step 504 in which the local Pentium®
bus is probed for traffic. If ADS# and a valid address
are present, then there is an announcement on the local
bus 22 and control is passed to decision block 506.
Alternately, if there is no announcement on the CPU bus,
then control returns to process block 502 for a
determination of the status of the local bus coinciding
with the next clock cycle. When an announcement is
detected in decision block 504, then control is passed
to decision block 506 to determine if the bus traffic
concerns a memory request or an I/O request as indicated
by line M/IO#. If the announcement and request is not
for memory, then control returns to process block 502.
If, alternately, there is a request for a valid memory
address, then in process block 508 the address on line
A[31:3] and the command, read/write, on line CPUW/R#
are stored in a system level local bus address and
command buffer in system arbitration and control unit 78.

Next, in process 510 the address and the command
are sent over either line 80 or line 84 as shown in Fig.
1, depending on the memory block to which they are
directed. The address and command are stored in memory
block-level local bus buffers in the appropriate memory
block 42 or 44 as determined by control signals 82
or 86. Subsequently, in process 512 the address that
was stored in the system level local bus address buffer
in step 508 is cleared. Then control returns to process
502 for a determination of the next clock.
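
One pass through steps 504 to 510 for the local bus may
be sketched as follows. The arguments are logical values
rather than raw pin levels, and block_of is a
hypothetical address-to-block mapping supplied by the
caller; neither is specified in the description above.

```python
def local_bus_step(ads, mem_request, address, write, block_of):
    """Model steps 504-510 of Fig. 3A for one clock.

    Returns None when control simply returns to step 502, otherwise
    the (block, address, command) tuple forwarded to a memory block.
    """
    if not ads:
        # Step 504: no announcement on the local bus.
        return None
    if not mem_request:
        # Step 506: the announcement concerns I/O, not memory.
        return None
    # Step 508: latch the address and command at system level.
    command = "write" if write else "read"
    # Step 510: forward to the block the address maps to.
    return block_of(address), address, command
```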

On the PCI bus the clock runs slower than the local
bus. Nevertheless, processing of the PCI bus
announcements, commands, and addresses by the system
level arbitration and control unit 78 is identical to
that for the local bus except for issues of priority of
access in the event of concurrent requests directed to
the same memory block.

The PCI bus processing begins at the next clock
514 determination. This leads to the determination in
step 516 of bus traffic on the PCI bus 36. If FRAME# is
asserted and a valid address AD is present, then there
is an announcement and control is passed to decision
block 518. Alternately, if there is no announcement on
the PCI bus, then control returns to process block 514
for a determination of the status of the bus coinciding
with the next clock cycle. When an announcement is
detected by decision block 516, then control is passed
to decision block 518 to determine if the PCI bus traffic
concerns a memory request or an I/O request. If the
announcement and request is not for memory, then control
returns to process block 514. If, alternately, there is
a request for a valid memory address, then in process
block 520 the address on line AD and the command,
read or write, on line C/BE# are stored in an address
and command buffer, respectively.

At this point arbitration may be needed if a concurrent
request directed to the same memory block is being
processed on the CPU path 502, 504, 506, 508. If
concurrent requests are for addresses contained in
separate memory blocks, then they can be processed in
parallel by control lines 82, 86 servicing,
respectively, memory blocks 42 and 44 as shown in Fig. 1.
If, alternately, concurrent PCI and local bus requests
are directed to addresses mapped to one memory block,
then priority will need to be established. In the
preferred embodiment, priority control is passed from
process 520 to process 522 for a comparison of the
addresses stored in process steps 508 and 520. Then in
decision process 524, a determination is made as to
whether the addresses announced by the local bus 22 and
the PCI bus 36 are mapped to the same memory block. If
the decision is in the affirmative, then in process 526
a wait state is introduced in the processing of the PCI
traffic, thereby giving priority to local bus traffic.
It is desirable to give priority to the local bus
because it is the faster bus and because its memory
accesses are of a defined and shorter duration than
those of the PCI bus. After the wait state, control
passes to process 528. If, alternately, the result of
decision 524 is that the addresses being concurrently
processed are not mapped to the same memory block, then
in process 528 the address and the command stored in
step 520 are sent over either line 80 or line 84 as
shown in Fig. 1, depending on the memory block to which
they are directed. The address and command are stored in
memory block-level PCI buffers in the appropriate memory
block 42 or 44.
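
The priority decision of steps 522 to 526 may be sketched
as follows. The function name and the block_of mapping
are hypothetical; the sketch only decides whether the PCI
request is held for a wait state.

```python
def pci_must_wait(local_addr, pci_addr, block_of):
    """Decision 524: does the PCI request collide with a local one?

    local_addr is None when no local bus request is pending. A True
    result corresponds to the wait state of process 526, which gives
    priority to the local bus.
    """
    if local_addr is None:
        return False
    return block_of(local_addr) == block_of(pci_addr)
```

Requests mapped to different blocks return False and so
proceed in parallel, as the paragraph above describes.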

Subsequently, in process 530 the address that was
stored in the system level PCI address buffer in step 520
is cleared. Then control returns to process 514 for a
determination of the next clock.

It should be noted that in this one preferred
embodiment, arbitration of priority between concurrent
quests, of mapping a request to memory or I/O, and of 
mapping a memory request to a specific memory block 
is being handled at a system level. It would be equally 
possible to handle these tasks at a memory block-level. 
This would increase system throughput and would re- 
quire that most of the circuitry connected with imple- 
menting the processes of Fig. 3A be duplicated in each 
memory block. Either approach may be suitable. 

Fig. 3B is a process flow diagram of memory block-level
processing of memory requests and main memory
accesses occurring, possibly concurrently, in each
memory block. The block-level local and PCI buffers
have been loaded in steps 510 and 528, respectively. It
remains to process those buffers. From start block 550
control is passed to process block 552. In process 552,
the determination of the next block-level buffer to be
processed, and perhaps passed to block-level main
memory, is commenced. Control is passed to decision
554 in which the next buffers, in this case either
block-level local bus or PCI bus, are analyzed. If null,
then the current buffers have not been downloaded with
an address or command and control returns to process 552
for polling of the next buffers. If, alternately, the
buffers contain address and control, then control passes
to decision block 556 in which it is determined if the
memory request can be processed as a read hit. If it can,
then no main memory access is required and control
returns to process 552. If, alternately, it is determined
in process 556 that the buffers' request cannot be
handled as a read hit, for example, because block-level
main memory access is required, then control is passed
to decision block 558.

In decision 558 a determination is made as to
whether the portion of main memory associated with this
memory block is currently under an inhibit status which
could, for example, be due either to a refresh of this
block's main DRAM memory as indicated by the signal
/F 358 and /RE 170, 258, 222 being active, or to a main
memory access of this block as indicated by signal /RE
being active. /RE will be active when either a
block-level read miss or write is being handled. If
either of these inhibits is taking place, then control
passes to process 560 for the introduction of a wait
state, after which control returns to decision 558.
Alternately, once the inhibit is removed and/or if no
inhibit is sensed, control passes directly from decision
558 to process 562 in which the block-level buffers, in
this case either local or PCI, are presented to
block-level main memory for processing. Subsequently,
control passes to process 552. In Fig. 3B a cycle of
memory block-level buffer analysis and block-level main
memory downloading has been detailed.
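
The decisions of Fig. 3B for a single buffer may be
sketched as follows. The buffer representation, the set
of cached addresses, and the returned labels are
assumptions made for this sketch.

```python
def block_level_step(buffer, cached_addrs, refresh_active, access_active):
    """Classify one block-level buffer as in decisions 554-558.

    buffer is None or an (address, command) pair; cached_addrs is the
    set of addresses held in the cache serving this buffer's bus.
    """
    if buffer is None:
        return "poll_next"        # 554: nothing loaded yet
    address, command = buffer
    if command == "read" and address in cached_addrs:
        return "read_hit"         # 556: no main memory access needed
    if refresh_active or access_active:
        return "wait"             # 558/560: main memory is inhibited
    return "main_memory"          # 562: present buffers to main memory
```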

In Fig. 3C the processing of a read hit at a memory 
block-level for the local bus buffers is detailed. The proc- 
esses shown apply generally to each of the memory 
block-level bus buffers. In other words, the process 
shown in Fig. 3C occurs with respect to both the local 
and PCI bus memory block-level buffers. 



The process begins with start block 580 and control
passes to process block 582 in which the loading of a
new set of address and control requests into the local
bus buffers is detected. Control is then passed to
decision block 584 in which it is determined whether at
the block main memory level a write condition is present,
and further, at the local bus buffer level whether a hit
condition has been detected, in which event control
passes to process block 586 for the introduction of a
wait state. This allows time for the current memory
access to refresh stale cache data in the local bus cache
within this memory block.

Subsequently, after the introduction of the wait state
in process block 586, control returns to decision block
584 and proceeds in that 584, 586 loop until such time
as the current memory access no longer involves a write
condition coupled with a local bus cache write hit. Once
that condition has been removed, and irrespective of
whether a memory access is taking place, control
proceeds to decision 588 in which a determination is made
as to whether a read is requested by the memory
block-level local bus buffers, and further whether the
address to which the read is directed is currently
contained in memory block-level local bus cache. In the
event that a read hit is not detected, then no read cycle
independent of main memory access can take place and
control returns to process block 582. Alternately, if a
read hit condition is indicated, then in process 590 the
burst mode is activated for the memory block-level local
bus cache. If the signal protocol on the local bus calls
for a burst notification, that command is sent at this
point. If this process loop were directed to the PCI bus,
rather than the local bus, this burst notification would
be evidenced by the TRDY# line being activated.

Subsequently, in process block 592, to which control
has passed, a burst countdown is activated, which
in this case is necessary because Pentium® local bus
burst cycles are of a predetermined length. This
process, step 592, could be eliminated in the event of a
PCI bus, in which case burst length is not fixed. In any
event, control is passed to decision block 594 in which a
determination is made as to whether a wait state is
required. There are no wait states on the local bus, but
if this process were for the PCI bus, for example, a
determination would be made as to whether IRDY# were
asserted. If asserted, indicating that the master wanted
a wait state before receiving data, then control would
pass to process block 596 for the introduction of a wait
state and return to decision block 594 for determination
of when a wait state was no longer present. Once the
determination in decision block 594 is in the negative,
in other words, that no wait state exists, then control
passes to process block 598 in which the next data packet
is transmitted from the memory block-level local bus
cache. Control subsequently passes to decision block
600 for determination as to whether or not the burst
interval has ended. In the case of a Pentium® bus, this
would involve determining whether the burst countdown
had reached a null state, in which case the burst
interval just engaged in was the last interval. In the
case of a PCI bus, this would involve a determination as
to whether FRAME# had been deactivated, in which case
the last block of data would be the last block involved
in the burst interval. In the event these determinations
were in the affirmative, in other words, that a burst
interval had ended, then control would pass from decision
block 600 to process block 582. Alternately, if it had
been determined that the burst interval was not complete,
then control would pass to decision block 594. This,
then, completes the processing of a basic read hit cycle
applicable to any cache in any one of a number of memory
blocks.
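
The inner loop of this read hit cycle, for the
fixed-length case, may be sketched as follows. The list
representation of a cache row, the per-clock ready flags,
and the function name are assumptions; termination by
FRAME# in the PCI case is not modelled.

```python
def burst_read(cache_row, order, ready_per_clock):
    """Transmit one fixed-length burst from a cached row.

    order is the column sequence produced by the burst circuitry;
    ready_per_clock holds one flag per clock, where False stands for
    a wait state. Returns the packets sent, in order.
    """
    pending = list(order)           # acts as the burst countdown
    sent = []
    for ready in ready_per_clock:
        if not pending:
            break                   # countdown reached its null state
        if not ready:
            continue                # wait state: retry on next clock
        sent.append(cache_row[pending.pop(0)])
    return sent
```

A wait state delays a packet by one clock but does not
change the order in which packets leave the cache.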

In Fig. 3D, the processing of memory block requests
requiring a main memory access, specifically a read
miss and a write hit/miss, is detailed. Beginning at
start block 620, control is passed to process block 622
in which the passage 562 of the next buffers from the
control and arbitration unit of a given memory block to
the main memory of that block is indicated. At that point
control is passed to decision 624 for determination of
whether that specific memory access is a read miss
access. The determination of a read miss is made by
comparing only the cache directory dedicated to the bus
from which the read request was received and the
requested address. The other caches of the memory block
devoted to other buses are not involved in this
determination.

In the event a read miss is being processed, then
control is passed to process 626 in which the appropriate
cache servicing the specific bus from which the read
request was received is enabled, and a copy of the data
to be read is uploaded to the block-level cache for that
bus from the main memory of that block. Control is
subsequently passed to process block 628 in which, after
the upload to the appropriate cache has taken place,
main memory access is terminated. At that point, what
had been a read miss is now processed similarly to a
read hit: control is passed to process block 630 in which
burst mode is activated; a burst notification, as
mentioned previously and if required by the bus protocol,
is sent; and a burst countdown, in the case of a
Pentium® bus read request, is activated. After this
process is completed, control is passed to decision 632
for determination as to whether a wait state has been
introduced into the cache access. If, for example, the
read access is from the PCI bus, then the assertion of
IRDY# would indicate a wait state was required until the
master was ready to receive the data available within the
memory block-level PCI cache. If decision 632 is in the
affirmative, then control is passed to process block 634
for the introduction of a wait state and subsequent
return to decision block 632. Once the removal of a wait
state or absence of a wait state is detected, control is
passed to process block 636 for transmission of the next
data burst.

Then control is passed to decision block 638 in
which it is determined whether the burst interval is
over. In the case of the local bus, this would be
indicated by the null state of the burst countdown; in
the case of the PCI bus this would be indicated by the
deactivation of FRAME#. In the event the determination is
in the negative, in other words, that the burst cycle is
not completed, then control returns to decision block
632. Alternately, if it is determined that the burst
interval has terminated, then control is passed to
process block 622 for retrieval of the next buffers
requiring access to block-level main memory.
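
The read miss handling just described may be sketched as
follows, with dictionaries standing in for main memory
and for each bus cache. These data structures and the
function name are assumptions made for this sketch.

```python
def read(main, caches, bus, addr):
    """Serve a read for one bus, filling only that bus's cache.

    main maps addresses to data; caches maps a bus name to that
    bus's cache dictionary. Returns (data, was_hit).
    """
    cache = caches[bus]
    hit = addr in cache            # only this bus's directory is checked
    if not hit:
        cache[addr] = main[addr]   # process 626: upload from main memory
    return cache[addr], hit        # data is then sent from the cache
```

The cache belonging to the other bus is neither consulted
nor modified, which matches the determination made at
decision 624.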

Alternately, if in decision block 624 a read miss is 
not indicated, then the block-level main memory access 
taking place is a write cycle 640. Control then is passed 
to process 642 in which all cache comparators in a spe- 
cific memory block are linked and perform as a unified 
rather than distributed cache. The comparison being 
made by these comparators is between all cache direc- 
tories in the memory block, in this case a local bus and 
PCI bus cache, and the address present in main mem- 
ory. This is required to assure that all stale data irrespec- 
tive of the cache in which it is found is updated during 
this write hit. 

Subsequently, control is passed to decision block 
644 for determination of whether a local bus write cache 
hit is detected. In the event the address written to in 
block-level main memory is also contained in the local 
bus cache associated with that memory block, then con- 
trol is passed to process block 646, in which case the 
transmission gate to the local bus cache is enabled and 
a copy of the new data just written to main memory is 
also copied from main memory to the appropriate row 
in the local bus cache. Control is then returned to deci- 
sion block 648. Alternately, if there is not a local bus 
cache write hit as determined in decision block 644, then 
control is also passed to decision block 648. This is re- 
quired because on a memory write both the local and 
PCI caches may contain identical addresses requiring 
updating of stale data. 

In decision block 648, determination is made as to 
whether a hit has been experienced in the memory 
block-level PCI cache. In the event that it has, control is 
passed to process block 650 for subsequent uploading 
of a copy of the data just written to main memory to the 
PCI cache. Control is then passed to process block 652. 
Alternately, if a determination 648 is made that there is 
no PCI cache write hit, then control is passed also to 
process block 652 for terminating main memory access 
and for uncoupling the comparators of the memory block 
and returning them to their normal separated state. In 
this state each comparator compares a memory block- 
level cache directory with the address present in the 
memory block-level address buffer associated with that 
bus. 
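
The write path of Fig. 3D may be sketched in the same
style. Dictionaries again stand in for main memory and
the caches, and the names used are hypothetical.

```python
def write(main, caches, addr, data):
    """Write to main memory and refresh every cache holding addr.

    Returns the names of the caches that were updated, in the order
    the caches dictionary lists them.
    """
    main[addr] = data                  # main memory write
    updated = []
    for name, cache in caches.items():
        if addr in cache:              # unified comparison (642-648)
            cache[addr] = main[addr]   # copy new data into that cache
            updated.append(name)
    return updated
```

A write miss leaves every cache untouched, whereas a hit
in both caches causes both to be updated from main
memory.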

This then completes the basic distributive arbitra- 
tion and control processing associated with this partic- 
ular preferred embodiment of the present invention. 

As compared to an L2 cache serving all of main
memory, the current distributive cache and cache
arbitration and control exhibits the following
advantages. First, any memory requests on the local and
PCI bus, including dual write hits, can be processed
concurrently in the current system, provided that the
hits occur in different memory blocks. Any memory
requests in an L2 cache are handled serially. Second, of
those memory requests on the local and PCI buses which
are directed to a single memory block, the following can
be performed concurrently in the current system: dual
read hits, a read miss and a read hit, a read hit and a
write miss, and a read hit and a write hit, provided in
the latter case the hit is not in the cache involved in
the read hit. By contrast, in an L2 cache all memory
requests are handled serially.
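
The combinations listed above may be summarized by the
following sketch. It assumes that, within a single memory
block, only the listed combinations proceed concurrently;
the request labels and the function name are
hypothetical.

```python
def concurrent_ok(req_a, req_b, same_block, write_hits_reader_cache=False):
    """Can a local bus request and a PCI request proceed together?

    Requests are "read_hit", "read_miss", "write_hit" or "write_miss".
    write_hits_reader_cache is True when a write hit falls in the
    cache that is serving the concurrent read hit.
    """
    if not same_block:
        return True                 # different blocks: always parallel
    if "read_hit" not in (req_a, req_b):
        return False                # both need the block's main memory
    other = req_b if req_a == "read_hit" else req_a
    if other == "write_hit":
        return not write_hits_reader_cache
    return True                     # read hit, read miss or write miss
```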

Signals

A more complete appreciation of process diagrams 
of Fig. 3A-D will be had by reference to the following 
discussion of bus level and cache level signals. 

PCI 

Address and Data Pins 

AD[31:0] Address and Data are multiplexed on the
same PCI pins. A bus transaction consists of an
address phase followed by one or more data phases.
PCI supports both read and write bursts. The address
phase is the clock cycle in which FRAME# is asserted.
During the address phase AD[31:0] contain a physical
address (32 bits). For I/O, this is a byte address; for
configuration and memory it is a DWORD address. During
data phases AD[07:0] contain the least significant byte
(lsb) and AD[31:24] contain the most significant byte
(msb). Write data is stable and valid when IRDY# is
asserted and read data is stable and valid when TRDY#
is asserted. Data is transferred during those clocks
where both IRDY# and TRDY# are asserted.

C/BE[3:0]# Bus Command and Byte Enables are
multiplexed on the same PCI pins. During the address
phase of a transaction, C/BE[3:0]# define the bus
command (refer to Section 3.1. for bus command
definitions).

During the data phase C/BE[3:0]# are used as Byte
Enables. The Byte Enables are valid for the entire data
phase and determine which byte lanes carry meaningful
data. C/BE[0]# applies to byte 0 (lsb) and C/BE[3]#
applies to byte 3 (msb).

FRAME# Cycle Frame is driven by the current master
to indicate the beginning and duration of an access.
FRAME# is asserted to indicate a bus transaction is
beginning. While FRAME# is asserted, data transfers
continue. When FRAME# is deasserted, the transaction is
in the final data phase.

IRDY# Initiator Ready indicates the initiating
agent's (bus master's) ability to complete the current
data phase of the transaction. IRDY# is used in
conjunction with TRDY#. A data phase is completed on any
clock in which both IRDY# and TRDY# are sampled asserted.
During a write, IRDY# indicates that valid data is
present on AD[31:0]. During a read, it indicates the
master is prepared to accept data. Wait cycles are
inserted until both IRDY# and TRDY# are asserted
together.

TRDY# Target Ready indicates the target agent's
(selected device's) ability to complete the current data
phase of the transaction. TRDY# is used in conjunction
with IRDY#. A data phase is completed on any clock in
which both TRDY# and IRDY# are sampled asserted. During
a read, TRDY# indicates that valid data is present on
AD[31:0]. During a write, it indicates the target is
prepared to accept data. Wait cycles are inserted until
both IRDY# and TRDY# are asserted together.
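
The data phase rule stated for IRDY# and TRDY# may be
sketched as follows. Each list holds one logical value
per clock, True meaning the signal is sampled asserted;
the function name is illustrative.

```python
def count_transfers(irdy, trdy):
    """Count clocks on which a PCI data phase completes.

    A transfer occurs only on clocks where both signals are sampled
    asserted; every other clock is a wait cycle.
    """
    return sum(1 for i, t in zip(irdy, trdy) if i and t)
```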

STOP# Stop indicates the current target is requesting
the master to stop the current transaction.

DEVSEL# Device Select, when actively driven,
indicates the driving device has decoded its address as
the target of the current access. As an input, it
indicates whether any device on the bus has been
selected.

Pentium® Local Bus

Address and Data Pins

BRDY# Burst Ready allows the CPU to insert wait
states as necessary to meet the memory subsystem's
timing requirements.

A[31:3] Address lines for the Pentium®.

D[63:0] Data lines for the Pentium®.

ADS# Address Strobe indicates the availability of a
valid address on the local bus.

NA# Next Address allows for pipelining address and
data commands by signaling the processor that, although
the data portion of the previous command is still
being processed, the address portion is complete and
a new address for the next operation can be made
available on the bus.



Claims 

1. In a computer system including; a plurality of buses, 
each bus serving at least one device, and each bus 
serving to communicate direct memory access 
read/write requests, data, and main memory ad- 
dresses, from a device to a main memory which is 
tightly coupled to the devices; a distributed cache 
memory system, comprising: 

a first group of one or more devices requiring 
direct memory access; 

a second group of one or more devices requir- 
ing direct memory access; 

a first bus connecting said first group of devices 
and the main memory; 
a first cache connected directly to said first bus
and to the main memory;

a second bus connecting said second group of
devices and the main memory;

a second cache connected directly to said second
bus and to the main memory;

a cache memory control and arbitration unit
coupled to said first and second cache, said
first and second bus, and to said main memory
for processing direct memory access requests
from said first and said second bus.

2. The distributed cache memory system as set forth
in claim 1, wherein:

said cache memory control and arbitration
unit couples said first and second cache during a
write request and uncouples said first and second
cache during a read request.

3. The distributed cache memory system as set forth
in claim 2, wherein:

said cache memory control and arbitration unit
includes means for copying data written to an
address located only in said main memory to
said main memory;

said cache memory control and arbitration unit
includes means for copying data written to an
address located in said main memory and said
first cache, to said main memory and said first
cache;

said cache memory control and arbitration unit
includes means for copying data written to an
address located in said main memory and said
second cache, to said main memory and said
second cache; and

said cache memory control and arbitration unit
includes means for copying data written to an
address located in said main memory, said first
cache and said second cache, to said main
memory, said first cache and said second
cache.

4. The distributed cache memory system as set forth
in claim 2, wherein:

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said first bus and directed to
an address located only in the main memory by
copying said address and data from main memory
to said first cache and transmitting said data
from said first cache to said first bus;

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said first bus and directed to
an address and data located both in the main
memory and in said first cache by transmitting
said data from said first cache to said first
bus;

said cache memory control and arbitration unit 
includes means for responding to a read re- 
quest communicated on said second bus and 
directed to an address located only in the main 
memory by copying said address and data from 
main memory to said second cache and trans- 
mitting said data from said second cache to 
said second bus; and 

said cache memory control and arbitration unit 
includes means for responding to a read re- 
quest communicated on said second bus and 
directed to an address and data located both in 
the main memory and in said second cache by 
transmitting said data from said second cache 
to second bus. 
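
The read cases recited above can be summarised as: a read on a given bus is always served from that bus's own cache, which is first filled from main memory if the address is not yet present. The following is a minimal illustrative sketch under that reading; TwoBusReadModel, its attributes, and the hit flag it returns are hypothetical names introduced here, not elements of the claims.

```python
class TwoBusReadModel:
    """Toy model (hypothetical): per-bus caches in front of one main memory."""

    def __init__(self, main):
        self.main = dict(main)  # address -> data held in main memory
        self.caches = [{}, {}]  # caches[0]: first bus, caches[1]: second bus

    def read(self, bus, address):
        """Return (data, hit) for a read arriving on bus 0 or bus 1."""
        cache = self.caches[bus]  # caches act independently during a read
        hit = address in cache
        if not hit:
            # Address located only in main memory: copy the address and
            # data into this bus's cache before transmitting.
            cache[address] = self.main[address]
        # In both cases the data is transmitted from the cache to the bus.
        return cache[address], hit
```

A read on one bus leaves the other bus's cache untouched in this sketch, which corresponds to the caches being uncoupled during a read request.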

5. The distributed cache memory system as set forth
in claim 1, wherein:

said cache memory control and arbitration 
unit includes means for determining priority for 
processing direct memory access requests from 
said first and said second bus. 

6. The distributed cache memory system as set forth
in claim 1, wherein: 

said cache memory control and arbitration 
unit includes means for pipelining memory access 
requests. 

7. The distributed cache memory system as set forth
in claim 1, wherein: 

said cache memory control and arbitration 
unit includes means for handling burst read re- 
quests. 

8. The distributed cache memory system as set forth
in claim 1, wherein:

said cache memory control and arbitration 
unit includes means for handling burst write re- 
quests. 

9. In a computer system including a first bus connected
to a first group of one or more devices, a second
bus connected to a second group of one or more
devices, a main memory controller connected to the
first and second bus, and a main memory element
connected to the main memory controller and the
first and second bus; an improved main memory element,
comprising:

a first cache connected directly to said first bus
and to the main memory element; 

a second cache connected directly to said sec- 
ond bus and to the main memory element; 

a cache memory control and arbitration unit 
coupled to said first and second cache, said 
first and second bus, the main memory element 
and to the main memory controller for processing
direct memory access requests from the
first and the second bus.

10. The improved main memory element as set forth in 
claim 9, wherein: 

said cache memory control and arbitration
unit couples said first and second cache during a 
write request and uncouples said first and second 
cache during a read request. 

11. The improved main memory element as set forth in
claim 10, wherein: 

said cache memory control and arbitration unit 
includes means for copying data written to an 
address located only in the main memory element
to the main memory element;

said cache memory control and arbitration unit 
includes means for copying data written to an 
address located in the main memory element
and said first cache, to the main memory element
and said first cache;

said cache memory control and arbitration unit 
includes means for copying data written to an
address located in the main memory element 
and said second cache, to the main memory el- 
ement and said second cache; and 



said cache memory control and arbitration unit
includes means for copying data written to an 
address located in the main memory element, 
said first cache and said second cache, to the 
main memory element, said first cache and said 
second cache.

12. The improved main memory element as set forth in
claim 10, wherein:

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said first bus and directed
to an address located only in the main
memory element by copying said address and
data from the main memory element to said first
cache and transmitting said data from said first
cache to said first bus;

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said first bus and directed
to an address and data located both in
the main memory element and in said first
cache by transmitting said data from said first
cache to said first bus;

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said second bus and
directed to an address located only in the main
memory element by copying said address and
data from the main memory element to said
second cache and transmitting said data from
said second cache to said second bus; and

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said second bus and
directed to an address and data located both in
the main memory element and in said second
cache by transmitting said data from said second
cache to second bus.

13. The improved main memory element as set forth in
claim 9, wherein:

said cache memory control and arbitration
unit includes means for determining priority for
processing direct memory access requests from
said first and said second bus.

14. The improved main memory element as set forth in 
claim 9, wherein: 

said cache memory control and arbitration 
unit includes means for pipelining memory access 
requests. 

15. The improved main memory element as set forth in 
claim 9, wherein: 

said cache memory control and arbitration 
unit includes means for handling burst read re- 
quests. 

16. The improved main memory element as set forth in
claim 9, wherein: 

said cache memory control and arbitration 
unit includes means for handling burst write re- 
quests. 



17. In a computer system including a first bus connected
to a first group of one or more devices, a second
bus connected to a second group of one or more
devices, and a main memory connected to the first
and second bus; an improved main memory, comprising:

a main memory arbitration and control unit connected
to said first and second bus;

at least one memory element connected to said
main memory arbitration and control unit and to
said first and said second bus, said memory element
containing:

a) a main memory element;

b) a first cache connected directly to said
first bus and to said main memory element;

c) a second cache connected directly to
said second bus and to said main memory
element;

d) a cache memory control and arbitration
unit coupled to said first and second cache,
said first and second bus, said main memory
element and to said main memory arbitration
and control unit for processing direct
memory access requests from the first
and the second bus.

18. The improved main memory as set forth in claim 17,
wherein:

said cache memory control and arbitration
unit couples said first and second cache during a
write request and uncouples said first and second
cache during a read request.

19. The improved main memory as set forth in claim 18,
wherein:

said cache memory control and arbitration unit
includes means for copying data written to an
address located only in said main memory element
to said main memory element;

said cache memory control and arbitration unit
includes means for copying data written to an
address located in said main memory element
and said first cache, to said main memory element
and said first cache;

said cache memory control and arbitration unit
includes means for copying data written to an
address located in said main memory element
and said second cache, to said main memory
element and said second cache; and

said cache memory control and arbitration unit
includes means for copying data written to an
address located in said main memory element,
said first cache and said second cache, to said
main memory element, said first cache and said
second cache.

20. The improved main memory as set forth in claim 18,
wherein:

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said first bus and directed
to an address located only in the main
memory element by copying said address and
data from main memory element to said first
cache and transmitting said data from said first
cache to said first bus;

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said first bus and directed
to an address and data located both in
the main memory element and in said first
cache by transmitting said data from said first
cache to said first bus;

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said second bus and
directed to an address located only in the main
memory element by copying said address and
data from the main memory element to said
second cache and transmitting said data from
said second cache to said second bus; and

said cache memory control and arbitration unit
includes means for responding to a read request
communicated on said second bus and
directed to an address and data located both in
the main memory element and in said second
cache by transmitting said data from said second
cache to second bus.

21. The improved main memory as set forth in claim 17,
wherein: 

said cache memory control and arbitration 
unit includes means for determining priority for 
processing direct memory access requests from 
said first and said second bus. 

22. The improved main memory element as set forth in 
claim 17, wherein:

said cache memory control and arbitration 
unit includes means for pipelining memory access
requests. 

23. The improved main memory element as set forth in 
claim 17, wherein: 

said cache memory control and arbitration
unit includes means for handling burst read requests.

24. The improved main memory element as set forth in 
claim 17, wherein:

said cache memory control and arbitration 
unit includes means for handling burst write re- 
quests. 

25. In a computer system including: a first bus connected
to a first group of one or more devices, a second
bus connected to a second group of one or more
devices, a main memory shared by and connected
to the first and second bus and a cache directly connected
to main memory; a method for handling read
memory access requests comprising the steps of:

separating the cache directly connected to
main memory into a first bus cache portion directly
connected to said first bus and to the
main memory and a second bus cache portion
directly connected to said second bus and to
the main memory;

detecting an initial first bus read data at address
request from said first bus and copying said requested
initial first bus read data from the main
memory to said first bus cache and transmitting
said copied initial first bus read data from said
first bus cache to said first bus;

detecting a next first bus read data at address
request from said first bus; and

a) transmitting said requested next first bus
read data from said first bus cache to said
first bus if a copy of said data is available
in said first bus cache at the time of said
request; and

b) copying said requested next first bus
read data from the main memory to said
first bus cache and transmitting said next
first bus read data from said first bus cache
to said first bus if said data is not available
in said first bus cache at the time of said
request.

26. The method for handling read memory access requests
of claim 25 which further comprises the
method for handling write requests comprising the
steps of:

detecting a write data request directed to an
address in the main memory and writing data to said
address of the main memory; and

a) determining whether said first bus cache
contains said address and writing said data to
said address in said first bus cache if said first
bus cache contains said address; and

b) determining whether said second bus cache
contains said address and writing said data to
said address in said second bus cache if said
second bus cache contains said address.

27. In a computer system including: a first bus connected
to a first group of one or more devices, a second
bus connected to a second group of one or more
devices, a main memory shared by and connected
to the first and second bus and a cache directly connected
to main memory and to the first and second
bus; a method for handling write memory access
requests comprising the steps of:

separating the cache directly connected to
main memory into a first bus cache portion directly
connected to said first bus and to the
main memory and a second bus cache portion
directly connected to said second bus and to
the main memory;

detecting a write data request directed to an address
in the main memory and writing data to
said address of the main memory; and

a) determining whether said first bus cache
contains said address and writing said data
to said address in said first bus cache if
said first bus cache contains said address;
and

b) determining whether said second bus
cache contains said address and writing
said data to said address in said second
bus cache if said second bus cache contains
said address.

28. The method for handling write memory access re- 
quests of claim 27 which further comprises the 
method for handling read requests comprising the 
steps of:

detecting an initial first bus read data at address 
request from said first bus and copying said re- 
quested initial first bus read data from the main 
memory to said first bus cache and transmitting 
said copied initial first bus read data from said 
first bus cache to said first bus; 

detecting a next first bus read data at address 
request from said first bus; and 

a) transmitting said requested next first bus 
read data from said first bus cache to said 
first bus if a copy of said next first bus data
is available in said first bus cache at the
time of said next first bus request; and

b) copying said requested next first bus 
read data from the main memory to said
first bus cache and transmitting said next 
first bus read data from said first bus cache 
to said first bus if said next first bus data is 
not available in said first bus cache at the 
time of said next first bus request.

29. A memory system for a computer, the memory system
comprising a main memory, a plurality of cache
memories each coupled to the main memory for information
exchange therebetween and each coupled
to a respective bus or device so that data derived
from the main memory and stored in the cache
memory can be delivered to the respective bus or
device, and a control unit for co-ordinating the operation
of the cache memories.